Fair AI Preparation Using Super-diversity

Adam M. Slocinski

Open eGovernment program, DSV, Stockholm University

2025-06-04

Seminar Objective

To provide a high-level overview of:

  • Background, problem and research question
  • Method selection, analysis and findings
  • Recommendations and discussion

Machine learning is “the most important general-purpose technology of our era” 1

What is machine learning (ML)?

Machine learning is a field of artificial intelligence (AI) that enables computers to learn and improve from experience without being explicitly programmed on how or what to learn.

  • An ML algorithm learns from input data; what it learns is a mathematical representation of a problem, known as an ML model.
  • ML model development has three stages:
    • pre-processing, in-processing, and post-processing.

Stages of ML model development 2

Stages of ML model development from Kheya et al. (2024)

Limitations

  1. Training data is implicitly or explicitly biased to begin with
  2. Group categorizations require strong definitions and tests to measure fair treatment
  3. Requires sensitive data, conflicting with privacy and availability
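
As a hedged illustration of what a "test to measure fair treatment" can look like, the sketch below computes a statistical parity difference: the gap in favourable-outcome rates between two groups. The toy records, the group names "A" and "B", and the function names are hypothetical and not taken from the thesis. It also shows why limitation 3 matters: the test cannot run without the sensitive group attribute.

```python
from collections import defaultdict


def positive_rates(records):
    """Return the share of favourable outcomes (outcome == 1) per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        positives[group] += outcome
    return {group: positives[group] / totals[group] for group in totals}


def statistical_parity_difference(records, group_a, group_b):
    """Difference in favourable-outcome rates between two groups."""
    rates = positive_rates(records)
    return rates[group_a] - rates[group_b]


# Hypothetical toy data: (group, outcome), where 1 is the favourable outcome.
records = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
]

spd = statistical_parity_difference(records, "A", "B")
print(spd)  # a value of 0 would mean equal favourable-outcome rates
```

The result depends entirely on how the groups are defined, which is the point of limitation 2.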

Fair AI Limitations 3

Prototypical Fair AI System from Buyl and De Bie (2024)

Mitigation strategies 4

Taxonomy of mitigation strategies by Kheya et al. (2024)

“these methods have been applied to binary classifications … but are, in theory, extendable” 5

Problem

“Highly Accurate, But Still Discriminatory” 6

“Assessing risk, automating racism” 7

“[T]wo contrasting research paradigms: one rooted in computer science (CS), the origin discipline of fair AI, and another one that is more socially-oriented and interdisciplinary (SOI)” 8

“Equipping practitioners to recognize and address algorithmic bias and fairness debt” and “Improving bias mitigation and ethical design to address fairness debt” 9

Unpacking Super-diversity

‘Super-diversity’ is a term intended to underline a level and kind of complexity surpassing anything previously experienced in a particular society due to global migration patterns. This results in wholly new and complex social formations marked by a dynamic interplay of variables. These variables co-condition integration outcomes. 10

Variables/dimensions:

  • Stratified country of origin traits
  • Migration channel and arrival society
  • Legal status hierarchy

Criticisms

  • Is this a theory, a concept, an approach?
  • Applicable only to urban populations with multiple waves of migrant arrivals?
  • Are dimensions… variables, individual traits, or group categorizations?
  • Is integration a measurable outcome?

Is this a classification problem?

“Nowadays however, an increasing number of cities and communities can be characterized as internationalized and super-diverse with no monolithic mainstream society but a multitude of diverse groups … [t]his raises the question of not only who integrates but also ‘into what?’” 11

Thesis Positioning

  1. This is an approach that can be viewed as a mitigation strategy (i.e., relabelling)
  2. This deals with categorizations for cluster/segmentation analysis
  3. A successful integration outcome means a fairer outcome has been achieved

Research question

How can super-diversity theory be integrated into fair AI system development to better account for the heterogeneous nature of human populations?

Strategy

Hermeneutic framework for the literature review process from Boell and Cecez-Kecmanovic (2014)

Method

Benefits of this framework:

  • Focus on readings, not report selection
  • Allows modifying research problem without feeling guilty (or like a failure)
  • Argument develops throughout the selection process, not after the selection is completed
  • Provides a method for interpretative analysis…

Limitation:

  • The study becomes largely a matter of interpretation

Thematic analysis coding

Thematic analysis process sample

Findings

  • AI or ML is not mentioned anywhere in super-diversity literature
  • Super-diversity, in its many forms, is not mentioned in fair AI literature
  • Support from another domain or interdisciplinary help is found in both
  • Hard to tag requirements when it is ambiguous what super-diversity dimensions/variables are supposed to be
  • Hard to tag requirements when in-processing mitigation strategies are the dominant approach
  • Integration, or mitigating against social exclusion, is not explicitly the goal of fair AI

Notable

  • Pan-ethnicity or mixed categorizations are mentioned
  • Traits are mostly binary, and it is acknowledged that intersecting traits can, in theory, propagate throughout the model
  • Proactive publication, curation, generation, preparation, and sharing of fair datasets is mentioned
  • More solutions or use-cases found in conference literature
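
To make the point about intersecting binary traits concrete, here is a minimal sketch that enumerates every combination of a set of binary traits. The trait names are placeholders, not variables from the reviewed literature; the only claim is the arithmetic one that k binary traits yield 2^k intersectional subgroups.

```python
from itertools import product


def intersectional_subgroups(traits):
    """Enumerate every combination of values for a list of binary traits."""
    names = list(traits)
    return [
        dict(zip(names, values))
        for values in product([0, 1], repeat=len(names))
    ]


# Hypothetical trait names, used only for illustration.
traits = ["trait_a", "trait_b", "trait_c"]
subgroups = intersectional_subgroups(traits)
print(len(subgroups))  # 2 ** 3 combinations
```

Each added trait doubles the number of subgroups, so subgroup sample sizes shrink quickly as more traits are intersected.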

We have an opportunity through the convergence of social and artificial sciences to enhance integration outcomes

Recommendation

  • The focus of the recommendation is to provide a guidance document for source data owners, who would:
    • Play the role of the human-in-the-loop (HiL) for the pre-processing phase of ML model development.
    • Supplant the AI developer’s role in data collection, preparation and feature engineering (for said dataset).
    • Take on accountability for dataset preparation for ML use.

Recommendation

  1. Be proactive in disclosing bias and disproportionality in the raw dataset, akin to a food nutrition label.
  2. Use a documentation framework to document the choices in category selection for protected attributes, akin to a checklist.
  3. Normalize the intersecting selected protected attributes, akin to the super-diversity map method.
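
One way a source data owner might approach the first recommendation is sketched below: a small "label" that reports, for each intersection of the selected protected attributes, its count, its share of the dataset, and its favourable-outcome rate. This is an assumption-laden illustration, not the thesis's guideline or the super-diversity map method. The function `dataset_label`, the column names, and the toy rows are all hypothetical.

```python
from collections import Counter


def dataset_label(rows, protected, target):
    """Summarise each intersection of protected attributes:
    count, share of the dataset, and favourable-outcome rate."""
    counts = Counter()
    positives = Counter()
    for row in rows:
        key = tuple(row[attr] for attr in protected)
        counts[key] += 1
        positives[key] += row[target]
    total = len(rows)
    return {
        key: {
            "count": n,
            "share": n / total,
            "positive_rate": positives[key] / n,
        }
        for key, n in counts.items()
    }


# Hypothetical raw dataset; "approved" == 1 is the favourable outcome.
rows = [
    {"migration_channel": "work", "legal_status": "permanent", "approved": 1},
    {"migration_channel": "work", "legal_status": "permanent", "approved": 1},
    {"migration_channel": "work", "legal_status": "temporary", "approved": 0},
    {"migration_channel": "asylum", "legal_status": "temporary", "approved": 0},
]

label = dataset_label(rows, ["migration_channel", "legal_status"], "approved")
for key, stats in sorted(label.items()):
    print(key, stats)
```

Publishing such a summary alongside the raw data would let downstream developers see under-represented intersections before training; which attributes to include is exactly the category-selection choice that the second recommendation asks to be documented.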

Preparation Guideline

Limitations

  • Did the thesis answer how this can be done?
  • Falls short on operationalizing the application of super-diversity in fair AI
  • Literature review was done manually, without collaborators, using Word and Excel
  • Fairness is not yet a standard

Future research

  • Can be part of the extended background to a DSR project
  • Survey
    • To determine requirements for embedding super-diversity variables into ML model
  • Use-case
    • Scan existing datasets on urban populations in global north and global south and apply guidelines

Discussion

The main contribution of this thesis is that it shed light on a topical, compelling problem space.

  • Can add to discussion on fairness regulation or education
  • Relates to a transition from data-driven systems to values-driven systems that:
    • Can empathize with us
    • Seek cooperation and co-existence
    • Are representative of the population they are designed to serve

Thank you

Footnotes

  1. (Kochling et al., 2020)

  2. (Kochling et al., 2020)

  3. (Benjamin, 2019)

  4. (Fahimi et al., 2024)

  5. (de Souza Santos, 2024)

  6. Max Planck Institute for the Study of Religious and Ethnic Diversity

  7. Reimagining “Integration” in the Light of the New Forms of Mobility